Enhancing Data Migration Performance via Parallel Data Compression

Authors

  • Jonghyun Lee
  • Marianne Winslett
  • Xiaosong Ma
  • Shengke Yu
Abstract

Scientific simulations often produce large volumes of output that are moved to another platform for visualization or storage. This long-distance migration is slow because the data are large and the network is slow. Compression can improve migration performance by reducing the data size, but compression is computation-intensive and so can raise costs. In this work, we show how to reduce data migration cost by incorporating compression into migration. We analyze eight scientific data sets and propose three approaches for parallel compression of scientific data. Our results show that with reasonably fast processors and typical parallel configurations, the compression cost for large scientific data is outweighed by the performance gain obtained by migrating less data. We found that a client-side compression approach (CC) can improve I/O and migration performance by an order of magnitude. In our experiments, CC always matches or outperforms migration without compression when we overlap migration with computation, even for dense floating-point data that compresses poorly. We also present a variant of CC that is well suited for use with implementations of two-phase I/O.
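
The trade-off described in the abstract can be illustrated with a rough cost model. The following Python sketch is not the authors' model or code: the function names, the parameters, and the assumption that compression throughput scales linearly with the number of processors are all hypothetical, and the model ignores any overlap of migration with computation. It only shows that compression pays off when the time spent compressing is smaller than the transfer time saved by sending less data.

```python
def plain_migration_seconds(size_mb, bandwidth_mb_s):
    """Time to send the data uncompressed over the network."""
    return size_mb / bandwidth_mb_s


def compressed_migration_seconds(size_mb, bandwidth_mb_s, ratio,
                                 compress_mb_s, n_procs=1):
    """Time to compress in parallel, then send the smaller output.

    ratio         -- assumed compression ratio (original size / compressed size)
    compress_mb_s -- assumed per-processor compression throughput
    n_procs       -- processors compressing in parallel (linear scaling assumed)
    """
    compress_time = size_mb / (compress_mb_s * n_procs)
    transfer_time = (size_mb / ratio) / bandwidth_mb_s
    return compress_time + transfer_time


def compression_pays_off(size_mb, bandwidth_mb_s, ratio,
                         compress_mb_s, n_procs=1):
    """True when compress-then-send beats sending the raw data."""
    return (compressed_migration_seconds(size_mb, bandwidth_mb_s, ratio,
                                         compress_mb_s, n_procs)
            < plain_migration_seconds(size_mb, bandwidth_mb_s))


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    print(plain_migration_seconds(1000, 10))
    print(compressed_migration_seconds(1000, 10, 2.0, 20, 4))
    print(compression_pays_off(1000, 10, 2.0, 20, 4))
```

In this simplified model, a slow network and several compressing processors favor compression, while a fast network combined with poorly compressible data does not.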

Similar resources

A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP lookup process is a key bottleneck in routing due to the increase in routing table size, increasing traffic, and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), for which existing solutions, such as BSD Radix Tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...

Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining

This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...

Optimizing and Enhancing Parallel Multi Storage Backup Compression for Real-time Database Systems

One of the big challenges in the world is the amount of data being stored, especially in data warehouses. Data stored in databases keeps growing as a result of business requirements for more information. A big portion of the cost of keeping large amounts of data is in the cost of disk systems and the resources utilized in managing the data. Backup Compression field in the database systems ha...

Reducing I/O Load in Parallel RDF Systems via Data Compression

The amount of RDF data published to the web is growing rapidly, which has led to an increase in research on systems for handling such vast amounts of data. Employing parallelism has been a common approach, for which parallel I/O of RDF data can be very time-consuming. To reduce I/O load without requiring preprocessing, we propose a syntactic subset of the Turtle syntax called Sterno which is ame...

Publication date: 2002